Document binarisation using Kohonen SOM

Author

  • E. Badekas
Abstract

An integrated system is proposed for binarising normal and degraded printed documents so that text characters can be visualised and recognised. In degraded documents, where considerable background noise or variation in contrast and illumination exists, many pixels cannot be easily classified as foreground or background. For this reason, binarisation should combine the results of a set of binarisation techniques, especially for pixels whose classification is highly uncertain. The proposed technique takes advantage of a set of selected binarisation algorithms by combining their results using a Kohonen self-organising map (SOM) neural network. To further improve the results, significant improvements are proposed for two of the most powerful document binarisation techniques used: the adaptive logical level technique and the improvement of integrated function algorithm. The proposed technique is extensively tested on a variety of degraded documents, and several experimental and comparative results demonstrating its performance are presented.
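The idea of combining several binarisation results with a Kohonen SOM can be illustrated with a minimal sketch. This is not the authors' system: the three thresholding functions are simple stand-ins (the adaptive logical level and integrated function techniques named in the abstract are not reproduced), and every function name, the two-unit map size, and the training parameters are assumptions chosen for illustration. Each pixel is described by the vector of votes it receives from the individual methods, a small one-dimensional SOM is trained on those vectors, and each pixel takes the class of its best-matching unit.

```python
import numpy as np

def global_mean_threshold(img):
    """Stand-in method: foreground (1) where the pixel is darker than the mean."""
    return (img < img.mean()).astype(float)

def midrange_threshold(img):
    """Stand-in method: foreground (1) below the midpoint of min and max."""
    return (img < (img.min() + img.max()) / 2.0).astype(float)

def otsu_threshold(img):
    """Stand-in method: foreground (1) at or below the Otsu threshold (8-bit levels)."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the dark class
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean of the dark class
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                 # ignore thresholds with an empty class
    sigma_b = (mu[-1] * omega - mu) ** 2 / denom   # between-class variance
    return (img <= np.nanargmax(sigma_b)).astype(float)

def train_som(samples, n_units=2, epochs=5, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D Kohonen SOM; parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    weights = rng.random((n_units, samples.shape[1]))
    positions = np.arange(n_units)
    n_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                      # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 1e-3)     # shrinking neighbourhood
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def combine_with_som(img, methods, **som_kwargs):
    """Binarise img by clustering the per-pixel votes of several methods."""
    votes = np.stack([m(img).ravel() for m in methods], axis=1)
    weights = train_som(votes, **som_kwargs)
    dists = ((votes[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    bmu = dists.argmin(axis=1)
    # The unit whose weights lean most towards "1" votes is treated as foreground.
    fg_unit = weights.mean(axis=1).argmax()
    return (bmu == fg_unit).reshape(img.shape)

if __name__ == "__main__":
    page = np.full((20, 20), 200.0)   # light background
    page[5:15, 8:12] = 30.0           # dark "stroke"
    methods = [global_mean_threshold, midrange_threshold, otsu_threshold]
    print(combine_with_som(page, methods).astype(int))
```

On a clean image like the one above all three stand-in methods agree, so the SOM has nothing to arbitrate. The combination only matters for pixels where the methods disagree, which is the highly uncertain case the abstract describes.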


Similar resources

Som-based Clustering of Textual Documents Using Wordnet

The classification of textual documents has been the subject of many studies. Technologies like the web and numerical libraries facilitated the exponential growth of available documentation. The classification of textual documents is very important since it allows users to skim large corpora quickly and effectively and to better understand their contents. Most classification approaches us...


Document Classification using Neural Networks

The paper starts with the need for classification. Then the reasons why neural networks are suitable for document classification are explained. The paper continues with the details of the most commonly used topologically organized network model proposed by Kohonen (1982), referred to as the self-organizing map (SOM). The general idea proposed is to display the contents of a document library by ...


Invited article SOM-based algorithms for qualitative variables

It is well known that the SOM algorithm achieves a clustering of data which can be interpreted as an extension of Principal Component Analysis, because of its topology-preserving property. But the SOM algorithm can only process real-valued data. In previous papers, we have proposed several methods based on the SOM algorithm to analyze categorical data, which is the case in survey data. In this p...


Image Inpainting based on Self-organizing Maps by Using Multi-agent Implementation

Image inpainting is a well-known task of visual editing. However, its efficiency strongly depends on the size and textural neighborhood of the “missing” area. Various methods of image inpainting exist, among which the Kohonen Self-Organizing Map (SOM) network as a means of unsupervised learning is widely used. The weaknesses of the Kohonen SOM network such as the necessity for tuning of algorithm p...


On the Use of Self-Organizing Map (SOM) in Linguistic Visualization

The availability of linguistic corpora in electronic form has made it possible to perform various kinds of computer-based analyses on them. The use of the Self-Organizing Map (SOM) in the analysis and visualization of various aspects of textual material is outlined. The basic approach for creating maps of lexical items, documents, or sets of languages or dialects is described with reference to original...




Publication date: 2007